A Framework for Distributed Cleaning of Data Streams
Authors
Abstract
Similar Resources
QueueLinker: A Framework for Parallel Distributed Processing of Data Streams
As computer systems have developed, ever more devices are being connected to networks and generating data streams. Analyzing data streams in real time offers valuable information about human activities and contributes to many information services. QueueLinker enables programmers to build data stream processing applications by implementing application modules that use a producer–consum...
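The producer–consumer pattern mentioned in the abstract above can be sketched in plain Python. This is a minimal, illustrative example using threads and a bounded queue; it does not show QueueLinker's actual API, and the doubling step stands in for arbitrary stream processing:

```python
# Illustrative producer-consumer pipeline (NOT QueueLinker's API):
# modules exchange stream items through a shared bounded queue.
import queue
import threading

SENTINEL = None  # marks the end of the stream


def producer(q, items):
    """Push stream items into the queue, then signal completion."""
    for item in items:
        q.put(item)
    q.put(SENTINEL)


def consumer(q, results):
    """Drain the queue, processing each item until the sentinel arrives."""
    while True:
        item = q.get()
        if item is SENTINEL:
            break
        results.append(item * 2)  # stand-in for real per-item processing


def run_pipeline(items):
    q = queue.Queue(maxsize=16)  # bounded queue provides backpressure
    results = []
    p = threading.Thread(target=producer, args=(q, items))
    c = threading.Thread(target=consumer, args=(q, results))
    p.start()
    c.start()
    p.join()
    c.join()
    return results
```

A bounded queue is the usual choice here: it keeps a fast producer from outrunning a slow consumer by blocking `put` when the buffer is full.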
A Framework for Clustering Evolving Data Streams
Clustering is a difficult problem in the data stream domain, because the large volume of data arriving in a stream renders most traditional algorithms too inefficient. In recent years, a few one-pass clustering algorithms have been developed for the data stream problem. Although such methods address the scalability issues of the clustering problem, they are generally blind...
A New Framework for Data Streams Classification
Mining data streams has recently become an important and challenging task for a wide range of services, including credit card fraud detection, sensor networks, and web applications. In these applications, data do not typically take the form of persistent relations, but tend to arrive in multiple, continuous, rapid, and time-varying data streams. Hence, conventional knowledge discovery tools cannot ...
A Framework for Data Cleaning in Data Warehouses
Achieving high data quality in data warehouses is a persistent challenge, and data cleaning is a crucial task in meeting it. A set of methods and tools has been developed for this purpose. However, at least two questions still need to be answered: How can efficiency be improved when performing data cleaning? How can the degree of automation be improved when perfor...
Models for Distributed, Large Scale Data Cleaning
Poor data quality is a serious and costly problem affecting organizations across all industries. Real data is often dirty, containing missing, erroneous, incomplete, and duplicate values. Declarative data cleaning techniques have been proposed to resolve some of these underlying errors by identifying the inconsistencies and proposing updates to the data. However, much of this work has focused o...
Journal
Journal title: Procedia Computer Science
Year: 2015
ISSN: 1877-0509
DOI: 10.1016/j.procs.2015.05.156